-
-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Long living client #108
Long living client #108
Conversation
* Backport #52643 Backporting changes from CH #52643 https://github.com/ClickHouse/ClickHouse/pull/52643/files * Update positional options in Client Backporting changes from CH #52643 https://github.com/ClickHouse/ClickHouse/pull/52643/files
Add legal disclaimer
(cherry picked from commit ccd5c3fa0c3bd9891089e3bcac0191886a18be48)
(cherry picked from commit fef0cd4695aa77280f13c030bb2a98b22f1702cb)
(cherry picked from commit f91fd51e199d814730bd70646a6461f23fa40be9)
(cherry picked from commit 3430ed8c527dc0163b701ca8ca810f57ca105ad7)
(cherry picked from commit 538f0b4018b927044c887c745af4bd08f460a5ef)
(cherry picked from commit 63f865693a2144e9b7a58a71b1b8f7bab0880fc7)
Also in case it's useful, I use this in Rust through this binding: use std::ffi::{CStr, CString};
use std::os::raw as libc;
use log::debug;
use crate::Error::GenericError;
#[repr(C)]
struct chdb_local_result {
buf: *mut libc::c_char,
len: usize,
error_message: *mut libc::c_char,
}
#[repr(C)]
struct chdb_init_result
{
local_server: *mut libc::c_void,
error_message: *mut libc::c_char,
}
#[link(name = "chdb")]
extern "C" {
fn chdb_free_result(local_server: *mut libc::c_void, result: *mut chdb_local_result);
fn chdb_connect(argc: i32, argv: *const *const libc::c_char) -> *mut chdb_init_result;
fn chdb_disconnect(local_server: *mut libc::c_void);
fn chdb_query(
local_server: *mut libc::c_void,
query: *const libc::c_char,
format: *const libc::c_char,
) -> *mut chdb_local_result;
}
pub struct RawChdbClient {
local_server: *mut libc::c_void,
}
impl RawChdbClient {
pub fn connect(path: &str) -> crate::Result<RawChdbClient> {
let argv: [*const libc::c_char; 3] = [
CString::new("clickhouse").unwrap().into_raw(),
CString::new("--multiquery").unwrap().into_raw(),
CString::new(format!("--path={}", path)).unwrap().into_raw(),
];
let result = unsafe { chdb_connect(3, argv.as_ptr()) };
unsafe {
drop(CString::from_raw(argv[0] as *mut libc::c_char));
drop(CString::from_raw(argv[1] as *mut libc::c_char));
drop(CString::from_raw(argv[2] as *mut libc::c_char));
}
if result.is_null() {
debug!("connect returned nullptr");
return Err(GenericError("Could not connect to chdb".to_string()));
}
unsafe {
if !(*result).error_message.is_null() {
let error = CStr::from_ptr((*result).error_message).to_string_lossy().into_owned();
debug!("result has error message: {}", error);
return Err(GenericError(error));
}
}
unsafe {
if (*result).local_server.is_null() {
return Err(GenericError("Could not connect to chdb, nullptr returned".to_string()));
}
}
Ok(RawChdbClient { local_server: unsafe { (*result).local_server } })
}
pub fn query(&self, query: &str, format: &str) -> crate::Result<Option<Vec<u8>>> {
debug!("Rust: starting chdb query process for query: {}", query);
let query_c = CString::new(query).unwrap();
let format_c = CString::new(format).unwrap();
let result = unsafe {
chdb_query(self.local_server, query_c.as_ptr(), format_c.as_ptr())
};
debug!("Rust: running query done.");
if result.is_null() {
debug!("result is null");
return Ok(None);
}
unsafe {
if !(*result).error_message.is_null() {
let error = CStr::from_ptr((*result).error_message).to_string_lossy().into_owned();
chdb_free_result(self.local_server, result);
debug!("result has error messsage: {}", error);
return Err(GenericError(error));
}
}
unsafe {
if (*result).buf.is_null() {
chdb_free_result(self.local_server, result);
return Err(GenericError("Unknown error occurred".to_string()));
}
}
let slice = unsafe { std::slice::from_raw_parts((*result).buf as *const u8, (*result).len) };
let output = slice.to_vec();
unsafe { chdb_free_result(self.local_server, result) };
Ok(Some(output))
}
pub fn disconnect(&self) {
unsafe {
chdb_disconnect(self.local_server)
}
}
} Where #[derive(Debug, thiserror::Error)]
pub enum Error {
#[error("{0}")]
GenericError(String),
// More errors here...
}
pub type Result<T> = std::result::Result<T, Error>; |
@Dennitz thanks! 💪 This is useful. We'll most likely use this draft as a reference to reimplement the changes! |
Sure, once the other changes make it into chdb, I can do that. |
One important thing to note: This implementation is not thread safe. So all |
|
the main branch is rebased, will try to use interactive clickhouse-local to optimize chdb later |
Done in #283 |
As requested in Discord, adding my draft implementation for a long-living client.
This branch is based off an older revision, e503e5b to be precise. So there are conflicts that would need to be resolved.
Also, the code for the chdb Python library is currently commented out, as I'm only using libchdb personally and wanted to avoid compile errors.
Please see this PR as a starting point or just as inspiration, and feel free to make any necessary changes that would be needed to get this integrated.
The main difference of my fork is that it allows creating a “chdb instance” (a LocalServer object), and reusing that for multiple queries.
The difference can probably be seen best based on the new interface:
You'd call
chdb_connect
once which returns a pointer to a client. You'd then callchdb_query
(each time followed bychdb_free_result
) any number of times, passing in the pointer to that client. And at the end callchdb_disconnect
.I implemented this mainly because it means queries finish quite a bit faster in certain cases. But a nice side effect is that this also adds support for stateful queries like USE some_db.
How do queries get faster? I noticed that the cleanup function of LocalServer can take quite a bit of time. And this gets amplified with each additional created MergeTree table. In my tests (on M1 Mac), each (empty) MergeTree table adds about 0.15s of cleanup time, even when the table isn’t actually used in the respective query. This happens also in clickhouse-local, so it’s not specific to chdb.
The cleanup function normally gets executed after each query before returning results in chdb. In the fork it gets executed only when destroying the instance, i.e. when calling chdb_disconnect. So on each individual query, there is no need to wait for cleanup.
I guess this is essentially the same difference as between running
clickhouse local --path . --query "select 1"
(cleanup right after the query), and running the query in an interactive session started withclickhouse local --path .
(cleanup only when killing the session).This PR (unfortunately) also contains some other changes, as it would have been hard to separate those now.
The other changes are:
error_message
field to the result (a similar thing also seems to be implemented in Add has_error and error_message methods #105 now)-DENABLE_LIBPQXX=1
, as I needed it in my project. I guess you might want to disable it though